{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "tags": [
     "remove-cell"
    ]
   },
   "outputs": [],
   "source": [
    "# Reference: https://jupyterbook.org/interactive/hiding.html\n",
    "# Use {hide, remove}-{input, output, cell} tags to hiding content\n",
    "\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "import pandas as pd\n",
    "import seaborn as sns\n",
    "%matplotlib inline\n",
    "import ipywidgets as widgets\n",
    "from ipywidgets import interact, interactive, fixed, interact_manual\n",
    "from IPython.display import display\n",
    "\n",
    "sns.set()\n",
    "sns.set_context('talk')\n",
    "np.set_printoptions(threshold=20, precision=2, suppress=True)\n",
    "pd.set_option('display.max_rows', 7)\n",
    "pd.set_option('display.max_columns', 8)\n",
    "pd.set_option('precision', 2)\n",
    "# This option stops scientific notation for pandas\n",
    "# pd.set_option('display.float_format', '{:.2f}'.format)\n",
    "\n",
    "def display_df(df, rows=pd.options.display.max_rows,\n",
    "               cols=pd.options.display.max_columns):\n",
    "    with pd.option_context('display.max_rows', rows,\n",
    "                           'display.max_columns', cols):\n",
    "        display(df)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Exercises"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- In the {ref}`ch:pa_collocated` section, we used an approximation to find AQS and\n",
    "  PurpleAir sensors within 50 meters of each other. Geospatial data appears in\n",
    "  all kinds of domains and data scientists have a variety of tools for working\n",
    "  with this kind of data. One such tool is the `geopandas` package ([link][gpd]).\n",
    "  Use the `geopandas` package to create a map of the US with the AQS sites marked.\n",
    "  \n",
    "[gpd]: https://geopandas.org/\n",
    "  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- Use a `geopandas` spatial join to find the closest PurpleAir sensor to each AQS sensor. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- Although our data cleaning process closely followed Barkjohn's, we had to\n",
    "  omit some steps for brevity. Read Section 3 (Quality assurance) of BarkJohn's\n",
    "  paper, and note down all the additional steps that the original analysis took\n",
    "  that we did not include in this chapter.\n",
    "  Which steps might be most important to include?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- Barkjohn's paper also distinguishes AQS sensors by whether they are\n",
    "  FRM (Federal Reference Method) or FEM (Federal Equivalent Method).\n",
    "  Do some research of your own to answer: what's the difference between\n",
    "  these two types of sensors? Which type is more accurate, if any?\n",
    "  Why did Barkjohn decide to include both types of sensors in their analysis?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- When we analyzed the PurpleAir data, we pointed out that PurpleAir sensors\n",
    "  apply two different types of corrections on the raw laser readings. One\n",
    "  correction is named CF1, and the other is named ATM.\n",
    "  Conduct your own EDA to find out how these two corrections differ in the data."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- In {numref}`ch:pa_modeling`, we wrote Model 2 as:\n",
    "\n",
    "  $$\n",
    "    \\begin{aligned}\n",
    "        f_{\\theta}(x_i) = \\text{PA}_i + \\theta\n",
    "    \\end{aligned}\n",
    "  $$\n",
    "  \n",
    "  Derive that $ \\hat{\\theta} = \\frac{1}{n} \\sum_i(\\text{AQS}_i - \\text{PA}_i) $\n",
    "  is the value for $ \\theta $ that minimizes the mean squared loss."
   ]
  },
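  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One possible starting point for the Model 2 exercise, assuming the mean squared\n",
    "loss is averaged over the $ n $ pairs of AQS and PurpleAir measurements, is to\n",
    "write the loss as a function of $ \\theta $ (the name $ L $ is our own notation\n",
    "here, not taken from the chapter):\n",
    "\n",
    "$$\n",
    "  \\begin{aligned}\n",
    "      L(\\theta) = \\frac{1}{n} \\sum_i \\left( \\text{AQS}_i - (\\text{PA}_i + \\theta) \\right)^2\n",
    "  \\end{aligned}\n",
    "$$\n",
    "\n",
    "Setting the derivative of $ L(\\theta) $ with respect to $ \\theta $ to zero\n",
    "and solving for $ \\theta $ is one way to proceed."
   ]
  },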
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- Consider the simple linear model without the intercept term. That is:\n",
    "\n",
    "  $$\n",
    "    \\begin{aligned}\n",
    "        f_{\\theta}(x_i) = \\theta \\cdot \\text{PA}_i\n",
    "    \\end{aligned}\n",
    "  $$\n",
    "  \n",
    "  Derive $ \\hat{\\theta} $, the model parameter that minimizes the mean squared\n",
    "  loss. Then, fit this model on the data and compare the test set RMSE against\n",
    "  the other models. How does it compare?"
   ]
  },
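  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a hedged hint for the no-intercept model, and again assuming the mean\n",
    "squared loss is averaged over the $ n $ pairs of measurements, the quantity to\n",
    "minimize might be written as follows (the name $ L $ is our own notation):\n",
    "\n",
    "$$\n",
    "  \\begin{aligned}\n",
    "      L(\\theta) = \\frac{1}{n} \\sum_i \\left( \\text{AQS}_i - \\theta \\cdot \\text{PA}_i \\right)^2\n",
    "  \\end{aligned}\n",
    "$$\n",
    "\n",
    "Because $ \\theta $ now multiplies $ \\text{PA}_i $, the chain rule brings a factor\n",
    "of $ \\text{PA}_i $ into each term of the derivative."
   ]
  },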
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- (Needs background from Chapter {numref}`%s <ch:linear>`.) For Model 3,\n",
    "  we fit a calibration model, then inverted it to find the prediction model.\n",
    "  Fit a prediction model directly, *without* fitting a calibration model.\n",
    "  You might be surprised to see that the RMSE of this model\n",
    "  is lower than using Model 3.\n",
    "  Why will the training set RMSE of the direct linear regression model\n",
    "  *always* be lower than inverting a calibration model?\n",
    "  Why might we prefer the calibration model anyway?"
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Tags",
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
